-
Notifications
You must be signed in to change notification settings - Fork 66
feat: taint checking and security #197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: taint checking and security #197
Conversation
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
|
@nrfulton quick clarifications -
|
We should infer automatically where-ever possible. I nthis case, I'm not sure how you would infer taint. I guess you assumption here is that |
|
Yes, that was the thinking. Making it configurable may make more sense for taint. Any thoughts on the best way to expose that? Would love your take on the code too. |
|
If there is a tainted variable in the context, everything downstream should get tainted. As for how variables get tainted in the first place, a common way people do this is to define sources, sinks, and (optionally) washers. These are wrappers around interfaces that produce sensitive data (e.g. HR database api), or where it enters an unsafe place (e.g. sending to a UI). |
4798f64 to
d1b09e1
Compare
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
d5f8033 to
5466c31
Compare
docs/dev/taint_analysis.md
Outdated
| component = CBlock("user input") | ||
| component.mark_tainted() # Sets SecLevel.tainted_by(component) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- CBlocks should be immutable.
- Naming a variable
Ccmponentand assigning it to aCBlockis confusing. - The cyclic reference here is a bit confusing and invites buggy code. Use
tainted_by(None)instead oftainted_by(self)for the root node.
c = CBlock("user input", sec_level=SecLevel.tained_by(None))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this; tainted_by(None) for root now
docs/dev/taint_analysis.md
Outdated
| component = CBlock("user input") | ||
| component.mark_tainted() # Sets SecLevel.tainted_by(component) | ||
|
|
||
| if component._meta["_security"].is_tainted(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not c.sec_level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defined it as a property and now this works as c.sec_level.is_tainted()
| print(f"Original CBlock is tainted: {not tainted_desc.is_safe()}") | ||
|
|
||
| # Create session | ||
| session = MelleaSession(OllamaModelBackend("llama3.2")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless the example critically depends on using a particular model, always use session = start_session() instead. This makes the examples easier to maintain.
|
|
||
| # The result should be tainted | ||
| print(f"Result is tainted: {not result.is_safe()}") | ||
| if not result.is_safe(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use is_tainted instead of is_safe. The meaning of safe is very ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all instances of is_safe
mellea/security/core.py
Outdated
| Returns: | ||
| The CBlock or Component that tainted this content, or None | ||
| """ | ||
| if self.level_type == "tainted_by": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially in a module called security.core, we should avoid use of magic strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created SecLevelType enum
mellea/security/core.py
Outdated
| sources.append(action) | ||
|
|
||
| # For Components, check their constituent parts for taint | ||
| if hasattr(action, 'parts'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead us something like:
match action:
case Component...
case CBlock...(If type(action) :> Component then check is not necessary because the Component protocol has a parts() method. )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it to use match/case
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
|
Thanks for the review @nrfulton ! |
Signed-off-by: Ambrish Rawat <[email protected]>
|
hi ambrish! |
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
|
Thanks for the changes; looks good. I'll run the workflows and we'll merge before the last release of the year (which will probably be next Wednesday) |
|
Great, would be good to get this in. |
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
Signed-off-by: Ambrish Rawat <[email protected]>
nrfulton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some issues with the way that taint information is stored.
Proposal:
- introduce a
TaintAnalysisProtocol withsec_metadata()and/orsec_level()methods. - CBlock, Component, and ModelOutputThunk should implement the
TaintAnalysisprotocol. Use private fields on each to do this, but access those fields through the protocol methods.
After doing this, it will be easier to fix the soundness bug in security/core.py.
I'm also not a huge fan of the local imports as a work-around to circular imports. But we can merge that in for now and I'll clean it up in our refactor, which will probably move these files around anyways.
| # Add security metadata based on taint sources | ||
| from mellea.security import SecLevel, SecurityMetadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this to avoid circular imports?
| def instruct( | ||
| self, | ||
| description: str, | ||
| description: str | CBlock, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakelorocco any objections?
|
|
||
| def test_sec_level_none(self): | ||
| """Test SecLevel.none() creates safe level.""" | ||
| from mellea.security.core import SecLevelType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move import to top level.
| except Exception: | ||
| # If parts() fails, continue without it | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this block. The parts() methods are now fully implemented as of v. 0.2.3.
| self, | ||
| value: str | None, | ||
| meta: dict[str, Any] | None = None, | ||
| sec_level: Any = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be sec_level : SecLevel | None?
| if sec_level is not None: | ||
| from mellea.security import SecurityMetadata | ||
|
|
||
| self._meta["_security"] = SecurityMetadata(sec_level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not self.sec_metadata = SecurityMetadata(sec_level)?
| # Check if action has security metadata and is tainted | ||
| if hasattr(action, "_meta") and "_security" in action._meta: | ||
| security_meta = action._meta["_security"] | ||
| if isinstance(security_meta, SecurityMetadata) and security_meta.is_tainted(): | ||
| sources.append(action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use of hasattr feels like a code smell.
| for part in parts: | ||
| if hasattr(part, "_meta") and "_security" in part._meta: | ||
| security_meta = part._meta["_security"] | ||
| if ( | ||
| isinstance(security_meta, SecurityMetadata) | ||
| and security_meta.is_tainted() | ||
| ): | ||
| sources.append(part) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unsound. Counter-example: consider any case where c.parts() contains elements of type Component and that component has cblocks with tainted data.
This PR introduces a minimal proof-of-concept for taint and security propagation across CBlock, ModelOutputThunk, and session flows, as discussed in generative-computing/mellea#189
.